0 Data

df <- read.csv("csv/20240805基本資料_retrospective.csv",
               na.strings = c("", "NA"),
               fileEncoding = "Big5")

df

0.1 variables

summary(df)
##       編號          性別           慢性病用藥狀況.複選. Comorbidity.多選. 
##  Min.   :   1   Length:6345        Length:6345          Length:6345       
##  1st Qu.:1726   Class :character   Class :character     Class :character  
##  Median :3634   Mode  :character   Mode  :character     Mode  :character  
##  Mean   :3548                                                             
##  3rd Qu.:5279                                                             
##  Max.   :6971                                                             
##  NA's   :1                                                                
##    醫院代碼         術前主述.複選.     Biopsy.date.確診日期.
##  Length:6345        Length:6345        Length:6345          
##  Class :character   Class :character   Class :character     
##  Mode  :character   Mode  :character   Mode  :character     
##                                                             
##                                                             
##                                                             
##                                                             
##  Biopsy.method.複選.  Cell.Type            多發性          切片檢體腫瘤惡性度
##  Length:6345         Length:6345        Length:6345        Length:6345       
##  Class :character    Class :character   Class :character   Class :character  
##  Mode  :character    Mode  :character   Mode  :character   Mode  :character  
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##    病理分期         危險因子..複選.    合併膀胱腫瘤         合併CIS         
##  Length:6345        Length:6345        Length:6345        Length:6345       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  surgical.margin       左右側          腫瘤位置.多選.       腫瘤大小        
##  Length:6345        Length:6345        Length:6345        Length:6345       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  完整檢體腫瘤惡性度 pathological.stage   術前水腎        
##  Length:6345        Length:6345        Length:6345       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  Lymphovascular.invasion Tumor.Necrosis     有無針對UTUC化療  
##  Length:6345             Length:6345        Length:6345       
##  Class :character        Class :character   Class :character  
##  Mode  :character        Mode  :character   Mode  :character  
##                                                               
##                                                               
##                                                               
##                                                               
##  針對UTUC化療型態     化療處方          NxUx.date        
##  Length:6345        Length:6345        Length:6345       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  NxUx.Access.method..複選. Bladder.cuff.resection other.bladder.cuff.method
##  Length:6345               Length:6345            Length:6345              
##  Class :character          Class :character       Class :character         
##  Mode  :character          Mode  :character       Mode  :character         
##                                                                            
##                                                                            
##                                                                            
##                                                                            
##  Lymphadenectomy.位置.複選. simultaneously.ipsilateral.adrenalectomy
##  Length:6345                Length:6345                             
##  Class :character           Class :character                        
##  Mode  :character           Mode  :character                        
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##  Post.operation.intravesical.C.T.instillation Endoscopic.resection.date
##  Length:6345                                  Length:6345              
##  Class :character                             Class :character         
##  Mode  :character                             Mode  :character         
##                                                                        
##                                                                        
##                                                                        
##                                                                        
##  Endoscopic.Access.method Endoscopic.Energy.device..複選.
##  Length:6345              Length:6345                    
##  Class :character         Class :character               
##  Mode  :character         Mode  :character               
##                                                          
##                                                          
##                                                          
##                                                          
##  endoscopic.ablation.次數 post.ablation.Intra.cavitary.C.T
##  Min.   : 0.000           Length:6345                     
##  1st Qu.: 1.000           Class :character                
##  Median : 2.000           Mode  :character                
##  Mean   : 2.416                                           
##  3rd Qu.: 3.000                                           
##  Max.   :15.000                                           
##  NA's   :5915                                             
##  Salvage.Nephroureterectomy date.of.salvage.NU date.of.segmental.resection
##  Length:6345                Length:6345        Length:6345                
##  Class :character           Class :character   Class :character           
##  Mode  :character           Mode  :character   Mode  :character           
##                                                                           
##                                                                           
##                                                                           
##                                                                           
##  Segmental.resection.Access.method..複選. salvage.Nephroureterectomy
##  Length:6345                              Length:6345               
##  Class :character                         Class :character          
##  Mode  :character                         Mode  :character          
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##  Date.of.salvage.NU Clavien.Dindo.classification.複選.
##  Length:6345        Length:6345                       
##  Class :character   Class :character                  
##  Mode  :character   Mode  :character                  
##                                                       
##                                                       
##                                                       
##                                                       
##  complication.list..請填寫手術併發症中英文均可. 術後住院天數..day.
##  Length:6345                                    Min.   : 0.000    
##  Class :character                               1st Qu.: 6.000    
##  Mode  :character                               Median : 7.000    
##                                                 Mean   : 8.541    
##                                                 3rd Qu.: 9.000    
##                                                 Max.   :99.000    
##                                                 NA's   :2021      
##  Residual.bladder.cuff Date.of.last.cystoscopy
##  Length:6345           Length:6345            
##  Class :character      Class :character       
##  Mode  :character      Mode  :character       
##                                               
##                                               
##                                               
##                                               
##  Bladder.UC.after.NUx.or.Endoscopic.or.Segmental.resection
##  Length:6345                                              
##  Class :character                                         
##  Mode  :character                                         
##                                                           
##                                                           
##                                                           
##                                                           
##  Date.of.Bladder.UC.recurrence.after.NUx.or.Endoscopic.or.Segmental.resection
##  Length:6345                                                                 
##  Class :character                                                            
##  Mode  :character                                                            
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##  患側Upper.ureter.or.renal.pelvis.local.recurrence
##  Length:6345                                      
##  Class :character                                 
##  Mode  :character                                 
##                                                   
##                                                   
##                                                   
##                                                   
##  Date.of.Upper.ureter.or.renal.pelvis.recurrence
##  Length:6345                                    
##  Class :character                               
##  Mode  :character                               
##                                                 
##                                                 
##                                                 
##                                                 
##  患側Lower.ureter.or.bladder.cuff..local.recurrence
##  Length:6345                                       
##  Class :character                                  
##  Mode  :character                                  
##                                                    
##                                                    
##                                                    
##                                                    
##  Date.of.Lower.ureter.or.bladder.cuff.recurrence 淋巴轉移及位置..複選.
##  Length:6345                                     Length:6345          
##  Class :character                                Class :character     
##  Mode  :character                                Mode  :character     
##                                                                       
##                                                                       
##                                                                       
##                                                                       
##  Date.of.LN.mets    遠端轉移.複選.     Date.of.distant.mets
##  Length:6345        Length:6345        Length:6345         
##  Class :character   Class :character   Class :character    
##  Mode  :character   Mode  :character   Mode  :character    
##                                                            
##                                                            
##                                                            
##                                                            
##  Disease.free.註.不含膀胱內復發.  Mortality         Date.of.mortality 
##  Length:6345                     Length:6345        Length:6345       
##  Class :character                Class :character   Class :character  
##  Mode  :character                Mode  :character   Mode  :character  
##                                                                       
##                                                                       
##                                                                       
##                                                                       
##  長期.Complication  手術到死亡間隔時間..月. Post.OP.1.month.eGFR
##  Length:6345        Min.   :     -51        Min.   : 0.3898     
##  Class :character   1st Qu.:      11        1st Qu.:57.5790     
##  Mode  :character   Median :      29        Median :    Inf     
##                     Mean   :    6254        Mean   :    Inf     
##                     3rd Qu.:      66        3rd Qu.:    Inf     
##                     Max.   :16708534        Max.   :    Inf     
##                     NA's   :3653            NA's   :79          
##    last.eGFR     Lost.follow.up     Longest.follow.up.month.for.BDFS.DFS
##  Min.   : 0.00   Length:6345        Min.   : -1.61                      
##  1st Qu.:24.80   Class :character   1st Qu.: 12.42                      
##  Median :54.41   Mode  :character   Median : 31.93                      
##  Mean   :  Inf                      Mean   : 43.21                      
##  3rd Qu.:  Inf                      3rd Qu.: 62.53                      
##  Max.   :  Inf                      Max.   :275.59                      
##  NA's   :43                         NA's   :326                         
##  Longest.follow.up.month.for.OS.CSS 死亡檔最後追蹤日期 死亡檔最長追蹤時間.month
##  Min.   :   0.00                    Length:6345        Min.   : -51.35         
##  1st Qu.:  20.86                    Class :character   1st Qu.:  21.48         
##  Median :  51.41                    Mode  :character   Median :  53.43         
##  Mean   :  65.76                                       Mean   :  66.14         
##  3rd Qu.:  95.27                                       3rd Qu.:  94.84         
##  Max.   :2023.10                                       Max.   :2023.10         
##  NA's   :52                                            NA's   :1001            
##      ECOG             ASA.score     身高..公分.         體重..公斤.    
##  Length:6345        Min.   :1.000   Length:6345        Min.   : 17.20  
##  Class :character   1st Qu.:2.000   Class :character   1st Qu.: 52.00  
##  Mode  :character   Median :3.000   Mode  :character   Median : 60.00  
##                     Mean   :2.581                      Mean   : 61.07  
##                     3rd Qu.:3.000                      3rd Qu.: 68.00  
##                     Max.   :4.000                      Max.   :646.00  
##                     NA's   :2322                       NA's   :1858    
##      生日              診斷年紀      術前Cr.level..mg.dl.
##  Length:6345        Min.   :  8.00   Length:6345         
##  Class :character   1st Qu.: 61.33   Class :character    
##  Mode  :character   Median : 69.18   Mode  :character    
##                     Mean   : 68.31                       
##                     3rd Qu.: 76.03                       
##                     Max.   :101.72                       
##                     NA's   :49

0.2 variable names

colnames(df)
##  [1] "編號"                                                                        
##  [2] "性別"                                                                        
##  [3] "慢性病用藥狀況.複選."                                                        
##  [4] "Comorbidity.多選."                                                           
##  [5] "醫院代碼"                                                                    
##  [6] "術前主述.複選."                                                              
##  [7] "Biopsy.date.確診日期."                                                       
##  [8] "Biopsy.method.複選."                                                         
##  [9] "Cell.Type"                                                                   
## [10] "多發性"                                                                      
## [11] "切片檢體腫瘤惡性度"                                                          
## [12] "病理分期"                                                                    
## [13] "危險因子..複選."                                                             
## [14] "合併膀胱腫瘤"                                                                
## [15] "合併CIS"                                                                     
## [16] "surgical.margin"                                                             
## [17] "左右側"                                                                      
## [18] "腫瘤位置.多選."                                                              
## [19] "腫瘤大小"                                                                    
## [20] "完整檢體腫瘤惡性度"                                                          
## [21] "pathological.stage"                                                          
## [22] "術前水腎"                                                                    
## [23] "Lymphovascular.invasion"                                                     
## [24] "Tumor.Necrosis"                                                              
## [25] "有無針對UTUC化療"                                                            
## [26] "針對UTUC化療型態"                                                            
## [27] "化療處方"                                                                    
## [28] "NxUx.date"                                                                   
## [29] "NxUx.Access.method..複選."                                                   
## [30] "Bladder.cuff.resection"                                                      
## [31] "other.bladder.cuff.method"                                                   
## [32] "Lymphadenectomy.位置.複選."                                                  
## [33] "simultaneously.ipsilateral.adrenalectomy"                                    
## [34] "Post.operation.intravesical.C.T.instillation"                                
## [35] "Endoscopic.resection.date"                                                   
## [36] "Endoscopic.Access.method"                                                    
## [37] "Endoscopic.Energy.device..複選."                                             
## [38] "endoscopic.ablation.次數"                                                    
## [39] "post.ablation.Intra.cavitary.C.T"                                            
## [40] "Salvage.Nephroureterectomy"                                                  
## [41] "date.of.salvage.NU"                                                          
## [42] "date.of.segmental.resection"                                                 
## [43] "Segmental.resection.Access.method..複選."                                    
## [44] "salvage.Nephroureterectomy"                                                  
## [45] "Date.of.salvage.NU"                                                          
## [46] "Clavien.Dindo.classification.複選."                                          
## [47] "complication.list..請填寫手術併發症中英文均可."                              
## [48] "術後住院天數..day."                                                          
## [49] "Residual.bladder.cuff"                                                       
## [50] "Date.of.last.cystoscopy"                                                     
## [51] "Bladder.UC.after.NUx.or.Endoscopic.or.Segmental.resection"                   
## [52] "Date.of.Bladder.UC.recurrence.after.NUx.or.Endoscopic.or.Segmental.resection"
## [53] "患側Upper.ureter.or.renal.pelvis.local.recurrence"                           
## [54] "Date.of.Upper.ureter.or.renal.pelvis.recurrence"                             
## [55] "患側Lower.ureter.or.bladder.cuff..local.recurrence"                          
## [56] "Date.of.Lower.ureter.or.bladder.cuff.recurrence"                             
## [57] "淋巴轉移及位置..複選."                                                       
## [58] "Date.of.LN.mets"                                                             
## [59] "遠端轉移.複選."                                                              
## [60] "Date.of.distant.mets"                                                        
## [61] "Disease.free.註.不含膀胱內復發."                                             
## [62] "Mortality"                                                                   
## [63] "Date.of.mortality"                                                           
## [64] "長期.Complication"                                                           
## [65] "手術到死亡間隔時間..月."                                                     
## [66] "Post.OP.1.month.eGFR"                                                        
## [67] "last.eGFR"                                                                   
## [68] "Lost.follow.up"                                                              
## [69] "Longest.follow.up.month.for.BDFS.DFS"                                        
## [70] "Longest.follow.up.month.for.OS.CSS"                                          
## [71] "死亡檔最後追蹤日期"                                                          
## [72] "死亡檔最長追蹤時間.month"                                                    
## [73] "ECOG"                                                                        
## [74] "ASA.score"                                                                   
## [75] "身高..公分."                                                                 
## [76] "體重..公斤."                                                                 
## [77] "生日"                                                                        
## [78] "診斷年紀"                                                                    
## [79] "術前Cr.level..mg.dl."

0.3 rename

colnames(df) <- c(
  "編號", "性別", "慢性病用藥狀況", "Comorbidity", "醫院代碼", "術前主述",
  "Biopsy_date_確診日期", "Biopsy_method", "Cell_Type", "多發性",
  "切片檢體腫瘤惡性度", "病理分期", "危險因子", "合併膀胱腫瘤", "合併CIS",
  "surgical_margin", "左右側", "腫瘤位置", "腫瘤大小", "完整檢體腫瘤惡性度",
  "pathological_stage", "術前水腎", "Lymphovascular_invasion", "Tumor_Necrosis",
  "有無針對UTUC化療", "針對UTUC化療型態", "化療處方", "NxUx_date",
  "NxUx_Access_method", "Bladder_cuff_resection", "other_bladder_cuff_method",
  "Lymphadenectomy_位置", "simultaneously_ipsilateral_adrenalectomy",
  "Post_operation_intravesical_CT_instillation", "Endoscopic_resection_date",
  "Endoscopic_Access_method", "Endoscopic_Energy_device",
  "endoscopic_ablation_次數", "post_ablation_Intra_cavitary_CT",
  "Salvage_Nephroureterectomy", "date_of_salvage_NU",
  "date_of_segmental_resection", "Segmental_resection_Access_method",
  "salvage_Nephroureterectomy", "Date_of_salvage_NU",
  "Clavien_Dindo_classification", "complication_list", "術後住院天數_天",
  "Residual_bladder_cuff", "Date_of_last_cystoscopy",
  "Bladder_UC_after_NUx_or_Endoscopic_or_Segmental_resection",
  "Date_of_Bladder_UC_recurrence_after_NUx_or_Endoscopic_or_Segmental_resection",
  "患側Upper_ureter_or_renal_pelvis_local_recurrence",
  "Date_of_Upper_ureter_or_renal_pelvis_recurrence",
  "患側Lower_ureter_or_bladder_cuff_local_recurrence",
  "Date_of_Lower_ureter_or_bladder_cuff_recurrence",
  "淋巴轉移及位置", "Date_of_LN_mets", "遠端轉移", "Date_of_distant_mets",
  "Disease_free", "Mortality", "Date_of_mortality", "長期_Complication",
  "手術到死亡間隔時間_月", "Post_OP_1_month_eGFR", "last_eGFR", "Lost_follow_up",
  "Longest_follow_up_month_for_BDFS_DFS", "Longest_follow_up_month_for_OS_CSS",
  "死亡檔最後追蹤日期", "死亡檔最長追蹤時間_月", "ECOG", "ASA", "身高", "體重",
  "生日", "診斷年紀", "術前Cr_level_mg_dl"
)
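
A hand-typed vector of 79 names has to match the column order exactly, so a rule-based clean-up may be easier to audit. The sketch below is an alternative, not the renaming used in this report: `clean_names()` is a hypothetical helper, shown on a toy subset of the raw names from section 0.2. It would not reproduce every hand-picked name above (for example `ASA` or `術後住院天數_天`).

```r
# Toy subset of the raw column names listed in section 0.2
raw_names <- c("Biopsy.method.複選.", "Comorbidity.多選.",
               "NxUx.Access.method..複選.", "surgical.margin", "Cell.Type")

# Hypothetical helper: strip the multi-select suffixes, then turn dots into underscores
clean_names <- function(x) {
  x <- gsub("\\.+(複選|多選)\\.*", "", x)  # drop ".複選." / ".多選." suffixes
  x <- gsub("\\.+", "_", x)                # runs of dots become one underscore
  gsub("^_+|_+$", "", x)                   # trim leading/trailing underscores
}

cleaned <- clean_names(raw_names)
print(cleaned)

# Guard against two raw names collapsing into the same cleaned name
stopifnot(!anyDuplicated(cleaned))
```

Whichever approach is used, comparing the length of the new name vector with `ncol(df)` before assigning is a cheap safeguard against a shifted or missing name.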

0.4 drop missing rows

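# Count the missing cells in each row and flag rows with more than 50 NAs
# (out of 79 columns); n_missing holds the indices of the flagged rows
n_missing <- which(rowSums(is.na(df)) > 50)

# Keep the remaining rows. setdiff() also works when no row is flagged,
# whereas df[-n_missing, ] would return zero rows for an empty index
df1 <- df[setdiff(seq_len(nrow(df)), n_missing), ]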
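
The cut-off of 50 missing cells is easier to judge when the distribution of missing cells per row is visible. A minimal sketch on a toy data frame, assuming the same rule (drop rows whose NA count exceeds a threshold); the data and the threshold of 3 are hypothetical:

```r
# Toy data frame: 4 rows, 5 columns, varying numbers of missing cells
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", NA, "z", NA),
  c = c(TRUE, NA, NA, NA),
  d = c(1.5, 2.5, NA, NA),
  e = c("p", "q", "r", NA)
)

na_per_row <- rowSums(is.na(toy))  # number of missing cells in each row
print(table(na_per_row))           # how many rows have 0, 1, 2, ... missing cells

threshold <- 3                     # hypothetical cut-off for this toy example
kept <- toy[na_per_row <= threshold, , drop = FALSE]
print(nrow(kept))
```

On the real data, the same table would show whether rows cluster clearly above and below 50 or whether the cut-off splits a continuous range.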

1 Variable Selection

1.1 package

library(dplyr)

1.2 select()

df1 <- df1%>%
  dplyr::select(編號, 性別, ECOG, 身高, 體重, 生日, Comorbidity, 腫瘤位置, 腫瘤大小, pathological_stage, Mortality, Date_of_mortality, 術前Cr_level_mg_dl, Post_OP_1_month_eGFR, 死亡檔最長追蹤時間_月)

df1

2 Data Type

2.1 factors

# Convert the categorical columns to factors in a single transform() call
df1 <- df1%>%
  transform(性別 = as.factor(性別),
            ECOG = as.factor(ECOG),
            Comorbidity = as.factor(Comorbidity),
            腫瘤位置 = as.factor(腫瘤位置),
            腫瘤大小 = as.factor(腫瘤大小),
            pathological_stage = as.factor(pathological_stage),
            Mortality = as.factor(Mortality))

df1%>%
  dplyr::select(性別, ECOG, Comorbidity, 腫瘤位置, pathological_stage, Mortality)
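
Some factor levels in this data set differ only in spelling: the tumour-size labels appear both with `<` and with the HTML entity `&#60;`, and Mortality code 3 appears under two labels. The sketch below shows one way to merge such levels on toy vectors copied from the recap output in section 2.4. It assumes each value starts with a single numeric code, which may not hold for the multi-select entries. The `?` in the size labels looks like a character lost in encoding and is left untouched.

```r
# Toy vectors mimicking the duplicated labels seen in summary(df1)
size <- c("3 ?2 & &#60; 3 cm", "3 ?2 & < 3 cm", "2 ?1 & &#60; 2 cm", NA)
mortality <- c("0 no", "3 Nonknown", "3 死因不明", "1 UTUC related")

# Decode the HTML entity for "<" before building the factor
size_clean <- factor(gsub("&#60;", "<", size, fixed = TRUE))

# Keep only the leading numeric code so that differently worded labels
# for the same code fall into one level
mortality_code <- factor(sub("^[[:space:]]*([0-9]+).*$", "\\1", mortality))

print(levels(size_clean))
print(table(mortality_code))
```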

2.2 numeric

# 身高 and 術前Cr_level_mg_dl were read as character (see summary(df));
# the other three columns are already numeric, so as.numeric() leaves them unchanged
df1 <- df1%>%
  transform(身高 = as.numeric(身高),
            體重 = as.numeric(體重),
            術前Cr_level_mg_dl = as.numeric(術前Cr_level_mg_dl),
            Post_OP_1_month_eGFR = as.numeric(Post_OP_1_month_eGFR),
            死亡檔最長追蹤時間_月 = as.numeric(死亡檔最長追蹤時間_月))
## Warning in eval(substitute(list(...)), `_data`, parent.frame()): NAs introduced
## by coercion
## Warning in eval(substitute(list(...)), `_data`, parent.frame()): NAs introduced
## by coercion

df1%>%
  dplyr::select(身高, 體重, 術前Cr_level_mg_dl, Post_OP_1_month_eGFR, 死亡檔最長追蹤時間_月)
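
The coercion warnings mean that some text entries could not be read as numbers, and the eGFR summary shows `Inf` values that distort the mean and the quartiles. A minimal sketch of both checks on toy vectors; the entry `"unknown"` is a hypothetical example of a non-numeric value, not one taken from the data:

```r
# Toy character column with a non-numeric entry
height_raw <- c("160", "158.5", "unknown", NA, "172")

height_num <- suppressWarnings(as.numeric(height_raw))

# Entries that were present as text but could not be parsed as numbers
failed <- !is.na(height_raw) & is.na(height_num)
print(height_raw[failed])

# Toy numeric column containing Inf, as in Post_OP_1_month_eGFR
egfr <- c(57.2, Inf, 0.39, NA, Inf)
egfr[is.infinite(egfr)] <- NA   # treat infinite values as missing
print(summary(egfr))
```

Listing the failed entries before overwriting the column makes it possible to decide whether they are typos worth correcting or genuinely missing values.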

2.3 date

df1 <- df1%>%
  transform(生日 = as.Date(生日, "%m/%d/%Y"))%>%
  transform(Date_of_mortality = as.Date(Date_of_mortality, "%m/%d/%Y"))

df1%>%
  dplyr::select(生日, Date_of_mortality)
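
`as.Date()` returns `NA` without a warning when a string does not match the given format, so parse failures are indistinguishable from originally missing dates. A small sketch, on toy strings, of how the two cases could be separated; the second value is a hypothetical entry written in a different layout:

```r
# Toy date strings; the second one does not follow the month/day/year layout
dates_raw <- c("10/01/1907", "1943-11-19", NA)

parsed <- as.Date(dates_raw, format = "%m/%d/%Y")

# Non-missing inputs that became NA after parsing
unparsed <- dates_raw[!is.na(dates_raw) & is.na(parsed)]
print(unparsed)
```

Running the same comparison on the raw 生日 and Date_of_mortality columns before they are overwritten would show how many of their NAs come from the format rather than from missing data.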

2.4 recap

data frame

df1

summary()

summary(df1)
##       編號        性別                               ECOG           身高      
##  Min.   :   1   1 男:2731   0 無症狀                   :2090   Min.   :128.0  
##  1st Qu.:1701   2 女:3539   1 有症狀,可步行對生活不影響:1799   1st Qu.:152.0  
##  Median :3604               2 躺在床上時間小於50%      : 324   Median :158.0  
##  Mean   :3518               3 躺在床上時間大於50%      :  48   Mean   :158.1  
##  3rd Qu.:5232               4 完全臥床                 :  15   3rd Qu.:164.0  
##  Max.   :6961               NA's                       :1994   Max.   :189.0  
##                                                                NA's   :1825   
##       體重             生日           
##  Min.   : 17.20   Min.   :1907-10-01  
##  1st Qu.: 52.00   1st Qu.:1936-07-01  
##  Median : 60.00   Median :1943-11-19  
##  Mean   : 61.08   Mean   :1944-07-17  
##  3rd Qu.: 68.00   3rd Qu.:1952-04-15  
##  Max.   :646.00   Max.   :1992-12-20  
##  NA's   :1797     NA's   :621         
##                              Comorbidity                 腫瘤位置   
##  0 none                            :1248   1 腎盂            :2564  
##  5 HTN                             : 744   4 下輸尿管        : 808  
##  5 HTN, 10 DM                      : 411   2 上輸尿管        : 653  
##  10 DM                             : 201   3 中輸尿管        : 520  
##  16 malignancy (非UTUC/ bladder UC): 144   1 腎盂, 2 上輸尿管: 511  
##  (Other)                           :3062   (Other)           :1210  
##  NA's                              : 460   NA's              :   4  
##               腫瘤大小       pathological_stage              Mortality   
##  4 ? 3cm          :2238   4 pT3, 9 pNx:1035     0 no              :2781  
##  3 ?2 & &#60; 3 cm: 552   2 pT1, 9 pNx:1027     1 UTUC related    :1244  
##  2 ?1 & &#60; 2 cm: 519   3 pT2, 9 pNx: 743     2 non-UTUC related:1218  
##  3 ?2 & < 3 cm    : 505   9 pNx, 1 pTa: 655     3 Nonknown        : 567  
##  2 ?1 & < 2 cm    : 497   4 pT3, 6 pN0: 333     3 死因不明        : 259  
##  (Other)          : 910   (Other)     :1697     (Other)           : 179  
##  NA's             :1049   NA's        : 780     NA's              :  22  
##  Date_of_mortality    術前Cr_level_mg_dl Post_OP_1_month_eGFR
##  Min.   :1989-02-10   Min.   : 0.190     Min.   : 0.3898     
##  1st Qu.:2012-03-22   1st Qu.: 1.000     1st Qu.:57.1777     
##  Median :2016-04-03   Median : 1.350     Median :    Inf     
##  Mean   :2014-09-08   Mean   : 2.361     Mean   :    Inf     
##  3rd Qu.:2019-01-14   3rd Qu.: 2.100     3rd Qu.:    Inf     
##  Max.   :2024-06-12   Max.   :21.860     Max.   :    Inf     
##  NA's   :3583         NA's   :565        NA's   :77          
##  死亡檔最長追蹤時間_月
##  Min.   : -51.35      
##  1st Qu.:  21.53      
##  Median :  53.50      
##  Mean   :  66.16      
##  3rd Qu.:  94.81      
##  Max.   :2023.10      
##  NA's   :932
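
The recap shows values that are hard to reconcile with the variables' meaning, such as a weight of 646, a follow-up of -51.35 months, and a follow-up of 2023.10 months. One way to surface such rows is a simple range check. The sketch below uses a toy data frame with hypothetical English column names and hypothetical plausibility limits; the actual limits are a clinical decision.

```r
# Toy rows reproducing the kinds of extremes visible in summary(df1)
toy <- data.frame(
  id          = 1:5,
  weight_kg   = c(60, 646, 17.2, 52, NA),
  followup_mo = c(53.5, 2023.10, -51.35, 21.5, 94.8)
)

# Hypothetical plausibility limits; adjust to the study's own definitions
limits <- list(weight_kg = c(20, 250), followup_mo = c(0, 600))

# TRUE where a non-missing value lies outside its limits
out_of_range <- sapply(names(limits), function(v) {
  x <- toy[[v]]
  !is.na(x) & (x < limits[[v]][1] | x > limits[[v]][2])
})

# Rows with at least one out-of-range value, for manual review
flagged <- toy[rowSums(out_of_range) > 0, ]
print(flagged)
```

Flagging rather than deleting keeps the decision about each record (correct, set to NA, or exclude) explicit.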